Statistical and computational tradeoffs in biclustering
Authors
Abstract
We consider the problem of identifying a small sub-matrix of activation in a large noisy matrix. We establish the minimax rate for the problem by showing upper and lower bounds, tight up to constants, on the signal strength needed to identify the sub-matrix. We consider several natural computationally tractable procedures and show that under most parameter scalings they are unable to identify the sub-matrix at the minimax signal strength. While we are unable to directly establish the computational hardness of the problem at the minimax signal strength, we discuss connections to some known NP-hard problems and their approximation algorithms.
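To make the problem setup concrete, the following is a minimal sketch under stated assumptions, not the authors' construction. It assumes a square matrix with standard Gaussian noise, a planted block of constant mean `mu`, and recovery by picking the rows and columns with the largest sums. The abstract does not name the tractable procedures it studies, so this sum-based rule, the function names, and all parameter values are hypothetical choices for illustration only.

```python
import random


def make_matrix(n, k, mu, rng):
    """Build an n x n matrix of N(0, 1) noise with a k x k block of mean mu.

    The block sits on k randomly chosen rows and k randomly chosen columns
    (an assumed noise and signal model, not taken from the source).
    """
    rows = set(rng.sample(range(n), k))
    cols = set(rng.sample(range(n), k))
    m = [
        [rng.gauss(0.0, 1.0) + (mu if i in rows and j in cols else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return m, rows, cols


def top_k_by_sum(m, k):
    """Return the k row indices and k column indices with the largest sums."""
    n = len(m)
    row_sums = [sum(row) for row in m]
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    top_rows = set(sorted(range(n), key=lambda i: row_sums[i], reverse=True)[:k])
    top_cols = set(sorted(range(n), key=lambda j: col_sums[j], reverse=True)[:k])
    return top_rows, top_cols


rng = random.Random(0)  # fixed seed so the run is reproducible
m, rows, cols = make_matrix(n=60, k=10, mu=10.0, rng=rng)
est_rows, est_cols = top_k_by_sum(m, k=10)
print("recovered planted block:", est_rows == rows and est_cols == cols)
```

The signal here is deliberately strong so that the simple rule succeeds; lowering `mu` toward the noise level is one way to observe such a rule starting to fail.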
Similar resources
Gene co-expression networks via biclustering Differential gene co-expression networks via Bayesian biclustering models
Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are locally co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-re...
Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering
Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-...
Discovering Relevance-Dependent Bicluster Structure from Relational Data
In this paper, we propose a statistical model for relevance-dependent biclustering to analyze relational data. The proposed model factorizes relational data into bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster and (2) all clusters are related to at least one dense block. These features simp...
Applying Biclustering to understand the molecular basis of phenotypic diversity
High-throughput techniques, such as DNA microarrays, that are used in gene expression measurements offer a unique and global insight into the molecular mechanisms of a living cell. Computational resources are fundamental in order to extract biologically interpretable information and deal with the large amount of data extracted from these techniques. Statistical analysis of microarray data is a ...